339 research outputs found

Estimating underlying articulatory targets of Thai vowels by using deep learning based on generating synthetic samples from a 3D vocal tract model and data augmentation

Author: Birkholz P
Lapthawan T
Prom-On S
Xu Y
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/04/2022

Representation learning is one of the fundamental issues in modeling articulatory-based speech synthesis using target-driven models. This paper proposes a computational strategy for learning underlying articulatory targets from a 3D articulatory speech synthesis model using a bi-directional long short-term memory recurrent neural network based on a small set of representative seed samples. From a seeding set, a larger training set was generated that provided richer contextual variations for the model to learn. The deep learning model for acoustic-to-target mapping was then trained to model the inverse relation of the articulation process. This method allows the trained model to map the given acoustic data onto the articulatory target parameters which can then be used to identify the distribution based on linguistic contexts. The model was evaluated based on its effectiveness in mapping acoustics to articulation, and the perceptual accuracy of speech reproduced from the estimated articulation. The results indicate that the model can accurately imitate speech with a high degree of phonemic precision

UCL Discovery

Posh accent and vocal attractiveness in British English

Author: Birkholz P
Hsu C
Jiao L
Wang C
Xu Y
Publication venue: ExLing2017
Publication date: 01/12/2017

Posh accent in British English is associated with upper class. Previous research on poshness has been centred on vocabulary, grammar and phonology, but little is known about the phonetic properties. This study, as part of a larger project, is an attempt to connect posh accent with attractiveness of voice through a common set of dimensions originating from emotional prosody research. Using VocalTractLab and Praat, we created stimuli varying in voice quality, nasality, formant shift ratio, pitch shift and duration. Results of two separate perception experiments showed that only voice quality and formant shift ratio functioned significantly. Breathy voice sounded the most posh and attractive, and pressed voice the least. Likewise, utterances with the smallest formant shift ratio sounded the most posh and attractive

UCL Discovery

Human vocal attractiveness as signaled by body size projection

Author: Birkholz P
Lee A
Liu X
Wu W-L
Xu Y
Publication date: 01/01/2013

Voice, as a secondary sexual characteristic, is known to affect the perceived attractiveness of human individuals. But the underlying mechanism of vocal attractiveness has remained unclear. Here, we presented human listeners with acoustically altered natural sentences and fully synthetic sentences with systematically manipulated pitch, formants and voice quality based on a principle of body size projection reported for animal calls and emotional human vocal expressions. The results show that male listeners preferred a female voice that signals a small body size, with relatively high pitch, wide formant dispersion and breathy voice, while female listeners preferred a male voice that signals a large body size with low pitch and narrow formant dispersion. Interestingly, however, male vocal attractiveness was also enhanced by breathiness, which presumably softened the aggressiveness associated with a large body size. These results, together with the additional finding that the same vocal dimensions also affect emotion judgment, indicate that humans still employ a vocal interaction strategy used in animal calls despite the development of complex language
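The abstract above does not define formant dispersion. As a minimal sketch, assuming the common definition of formant dispersion as the average spacing between adjacent formant frequencies, the measure could be computed as follows. The function name and the example frequencies are illustrative and are not taken from the paper.

```python
def formant_dispersion(formants_hz):
    """Average spacing in Hz between adjacent formants.

    Assumed definition: (F_N - F_1) / (N - 1), which equals the mean of
    the differences between neighbouring formants.
    """
    if len(formants_hz) < 2:
        raise ValueError("need at least two formant frequencies")
    ordered = sorted(formants_hz)
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


# Illustrative values only: wider spacing is the pattern associated with
# a shorter vocal tract, narrower spacing with a longer one.
narrow = formant_dispersion([500.0, 1500.0, 2500.0, 3500.0])
wide = formant_dispersion([600.0, 1800.0, 3000.0, 4200.0])
```

Under this definition, scaling all formants by a common ratio scales the dispersion by the same ratio.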

CiteSeerX

Directory of Open Access Journals

UCL Discovery

PubMed Central

Publikationsserver der RWTH Aachen University

HKU Scholars Hub

FigShare

Estimation of Pitch Targets from Speech Signals by Joint Regularized Optimization

Author: Birkholz P
Schmager P
Xu Y
Publication venue: European Signal Processing Conference (EUSIPCO)
Publication date: 03/12/2018

This paper presents a novel method to estimate the pitch target parameters of the target approximation model (TAM). The TAM allows the compact representation of natural pitch contours on a solid theoretical basis and can be used as an intonation model for text-to-speech synthesis. In contrast to previous approaches, the method proposed here estimates the parameters of all targets jointly, uses 5th-order (instead of 3rd-order) linear systems to model the target approximation process, and uses regularization to avoid unnatural pitch targets. The effects of these features on the modeling error and the target parameter distributions are shown. The proposed method has been made available as the open-source software tool TargetOptimizer
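The abstract gives no equations, so the following is only a sketch of how a single target approximation segment might be evaluated. It assumes a linear pitch target x(t) = m·t + b approached by an N-th order critically damped linear system with time constant tau, giving f0(t) = x(t) + e^(-t/tau) · (c_0 + c_1·t + ... + c_{N-1}·t^(N-1)), where the coefficients are fixed by the f0 value and its derivatives at segment onset. All names here are hypothetical. This is not the TargetOptimizer implementation, and it leaves out the joint estimation and regularization steps.

```python
import math


def tam_segment(m, b, tau, state0, order=5):
    """Return f0(t) for one target segment (assumed formulation).

    m, b   : slope and offset of the linear pitch target x(t) = m*t + b
    tau    : time constant of the critically damped system (seconds)
    state0 : [f0(0), f0'(0), ..., f0^(order-1)(0)] at segment onset
    """
    a = 1.0 / tau
    # Derivatives of the linear target at t = 0: value, slope, then zeros.
    target_derivs = [b, m] + [0.0] * max(order - 2, 0)
    coeffs = []
    for k in range(order):
        # Match the k-th derivative of x(t) + p(t) * exp(-a*t) at t = 0.
        acc = state0[k] - target_derivs[k]
        for j in range(k):
            acc -= (math.comb(k, j) * math.factorial(j)
                    * coeffs[j] * (-a) ** (k - j))
        coeffs.append(acc / math.factorial(k))

    def f0(t):
        poly = sum(c * t ** i for i, c in enumerate(coeffs))
        return m * t + b + poly * math.exp(-a * t)

    return f0


# Illustrative values: start at 90 (arbitrary pitch units) with zero
# derivatives and approach a static target of 100.
contour = tam_segment(m=0.0, b=100.0, tau=0.02, state0=[90.0, 0, 0, 0, 0])
```

In a sequence of targets, the final state of one segment would serve as `state0` for the next, which keeps the contour smooth across segment boundaries.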

UCL Discovery

Articulatory Synthesis for Data Augmentation in Phoneme Recognition

Author: Birkholz P
Gerazov B
Krug PK
van Niekerk DR
Xu A
Xu Y
Publication venue: International Speech Communication Association (ISCA)
Publication date: 22/09/2022

While numerous studies on automatic speech recognition have been published in recent years describing data augmentation strategies based on time or frequency domain signal processing, few works exist on the artificial extensions of training data sets using purely synthetic speech data. In this work, the German KIEL corpus was augmented with synthetic data generated with the state-of-the-art articulatory synthesizer VOCALTRACTLAB. It is shown that the additional synthetic data can lead to a significantly better performance in single-phoneme recognition in certain cases, while at the same time, the performance can also decrease in other cases, depending on the degree of acoustic naturalness of the synthetic phonemes. As a result, this work can potentially guide future studies to improve the quality of articulatory synthesis via the link between synthetic speech production and automatic speech recognition

UCL Discovery

Modelling microprosodic effects can lead to an audible improvement in articulatory synthesis

Author: Birkholz P
Gerazov B
Krug PK
van Niekerk DR
Xu A
Xu Y
Publication venue: ACOUSTICAL SOC AMER AMER INST PHYSICS
Publication date: 01/08/2021

When pitch is explicitly modelled for parametric speech synthesis, microprosodic variations of the fundamental frequency f0 are usually disregarded by current intonation models. While there are numerous studies dealing with the nature and the origin of microprosody, little research has been done on its audibility and its effect on the naturalness of synthetic speech. In this work, the influence of obstruent-related microprosodic variations on the perceived naturalness of articulatory speech synthesis was studied. A small corpus of 20 German words and sentences was re-synthesized using the state-of-the-art articulatory synthesizer VocalTractLab. The pitch contours of the real utterances were extracted and fitted with the Target-Approximation-Model. After the real microprosodic variations were removed from the obtained pitch contours, synthetic variations were applied based on a microprosody model. Subsequently, multiple stimuli with different microprosody amplitudes were synthesized and evaluated in a listening experiment. The results indicate that microprosodic variations are barely audible, but can lead to a greater perceived naturalness of the synthesized speech in certain cases

UCL Discovery

Prevention of Glaucoma-Induced Retinal Ganglion Cell Loss Using Alpha7 nAChR Agonists

Author: Birkholz P J
Gossman C A
Linn Cindy L
Linn David M.
Webster M K
Publication venue: ScholarWorks@GVSU
Publication date: 30/03/2016

In this study, the neuroprotective effect of various nicotinic alpha7 acetylcholine receptor agonists in an in vivo model of glaucoma using adult Long Evans rats was analyzed. Glaucoma-like conditions were induced in the eyes of Long Evans rats after injection of hypertonic saline into episcleral veins to create scar tissue and increase the animal’s intraocular pressure. This procedure produced significant loss of retinal ganglion cells (RGCs) within one month and was associated with an increase of intraocular pressure. Using this model system, various alpha7 nicotinic acetylcholine receptor (α7 nAChR) agonists were applied at different doses as eye drops to the right eye of adult Long Evans rats while the left eye was left as an internal control. The α7 nAChR agonists used in this study prevented loss of RGCs in a dose-dependent manner after the procedure to induce glaucoma-like conditions. PHA-543613 and PNU-282987 provided the largest degree of RGC survival after inducing glaucoma-like conditions, followed by nicotine, SEN 12333, tropisetron, 3-Bromocytisine and DMAB. To provide evidence that neuroprotection of RGCs was mediated through activation of α7 nAChR, in some studies different concentrations of the α7 nAChR antagonist, MLA, were intravitreally injected into experimentally treated eyes before initiation of eye drops and the procedure to induce glaucoma-like conditions. In the presence of MLA, RGC neuroprotection was blocked. Results from these studies suggest that selective α7 nAChR agonists may be used in future therapeutic treatments for glaucoma or other CNS diseases associated with α7 nAChRs

Scholarworks@GVSU

Model-based exploration of linking between vowel articulatory space and acoustic space

Author: Birkholz P
Gerazov B
Krug PK
Prom-On S
van Niekerk D
Xu A
Xu Y
Publication venue: 'The International Fiscal Association of Korea'
Publication date: 03/09/2021

While the acoustic vowel space has been extensively studied in previous research, little is known about the high-dimensional articulatory space of vowels. The articulatory imaging techniques are limited to tracking only a few key articulators, leaving the rest of the articulators unmonitored. In the present study, we attempted to develop a detailed articulatory space obtained by training a 3D articulatory synthesizer to learn eleven British English vowels. An analysis-by-synthesis strategy was used to acoustically optimize vocal tract parameters that represent twenty articulatory dimensions. The results show that tongue height and retraction, larynx location and lip roundness are the most perceptually distinctive articulatory dimensions. Yet, even for these dimensions, there is a fair amount of articulatory overlap between vowels, unlike the fine-grained acoustic space. This method opens up the possibility of using modelling to investigate the link between speech production and perception

UCL Discovery

Modell einer Frauenstimme für die artikulatorische Sprachsynthese mit VocalTractLab Studientexte zur Sprachkommunikation

Author: Birkholz P.
Drechsel S.
Frahm J.
Gao Y.
Publication date: 01/01/2019

A model of a female voice is presented for the articulatory speech synthesis system VocalTractLab, whose released version is based on the geometric model of a male vocal tract. Using MRI scans, jaw impressions and speech recordings of a trained female speaker, the anatomical parameters of the female vocal tract were determined, and the target shapes of the individual speech sounds and of the glottal gestures were adapted. Speech synthesis directly from text or from a phonetic transcription is not yet automatic in VocalTractLab. The steps for creating gestural scores are described, and the results of a first listener survey on the quality of the synthetic female voice are presented

MPG.PuRe

Geschwächte Welle-Nabe-Pressverbindung

Author: Birkholz H.
Dietz P.
Grünendick T.
Schäfer G.
Publication date: 01/01/2004

Publikationsserver der Technischen Universität Clausthal